Learning Halfspaces



Cryptographic Hardness of Learning Halfspaces with Massart Noise

Neural Information Processing Systems

We study the complexity of PAC learning halfspaces in the presence of Massart noise. In this problem, we are given i.i.d. labeled examples whose labels are generated by an unknown halfspace and then flipped independently with an instance-dependent probability $\eta(x) \le \eta < 1/2$. The goal of the learner is to compute a hypothesis with small 0-1 error. Our main result is the first computational hardness result for this learning problem. Specifically, assuming the (widely believed) subexponential-time hardness of the Learning with Errors (LWE) problem, we show that no polynomial-time Massart halfspace learner can achieve error better than $\Omega(\eta)$, even if the optimal 0-1 error is small, namely $\mathrm{OPT} = 2^{-\log^{c}(N)}$ for any universal constant $c \in (0, 1)$.
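As an illustrative aside (not from the paper's text): Massart noise can be simulated by flipping each clean halfspace label independently with an instance-dependent probability bounded by $\eta < 1/2$. The smooth choice of $\eta(x)$ below is hypothetical, picked only for the demo; in the Massart model it may be an arbitrary function bounded by $\eta$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean labels from a halfspace f(x) = sign(<w, x>); w is assumed known here.
w = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(1000, 3))
clean = np.sign(X @ w)

# Massart noise: each label flips with probability eta(x) <= eta_bound < 1/2.
# This particular eta(x) is a hypothetical smooth choice, for illustration only.
eta_bound = 0.3
eta_x = eta_bound / (1.0 + np.exp(-X[:, 0]))  # always <= eta_bound
flips = rng.random(1000) < eta_x
y = np.where(flips, -clean, clean)

# Observed flip rate; it can never exceed eta_bound in expectation.
print(float(np.mean(y != clean)))
```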


Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

Birnbaum, Aharon, Shalev-Shwartz, Shai

Neural Information Processing Systems

Given $\alpha, \epsilon$, we study the time complexity required to improperly learn a halfspace with misclassification error rate of at most $(1 + \alpha)\,L^*_\gamma + \epsilon$, where $L^*_\gamma$ is the optimal $\gamma$-margin error rate. For $\alpha = 1/\gamma$, polynomial time and sample complexity is achievable using the hinge loss. For $\alpha = 0$, Shalev-Shwartz et al. (2011) showed that $\mathrm{poly}(1/\gamma)$ time is impossible, while learning is possible in time $\exp(\tilde{O}(1/\gamma))$. An immediate question, which this paper tackles, is what is achievable if $\alpha \in (0, 1/\gamma)$. We derive positive results interpolating between the polynomial time for $\alpha = 1/\gamma$ and the exponential time for $\alpha = 0$. In particular, we show that there are cases in which $\alpha = o(1/\gamma)$ but the problem is still solvable in polynomial time.
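For intuition about the polynomial-time regime (a sketch, not the paper's construction): minimizing the convex hinge loss by subgradient descent yields a halfspace learner; all names and parameters below are illustrative.

```python
import numpy as np

def hinge_halfspace(X, y, margin=1.0, lr=0.1, epochs=200):
    """Subgradient descent on the mean margin hinge loss (illustrative sketch)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Subgradient of mean(max(0, margin - y * <w, x>)) over the sample.
        slack = margin - y * (X @ w)
        active = slack > 0
        grad = -(y[active, None] * X[active]).sum(axis=0) / n
        w -= lr * grad
    return w

# Toy linearly separable data labeled by a known halfspace.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.sign(X @ np.array([2.0, -1.0]))
w = hinge_halfspace(X, y)
err = np.mean(np.sign(X @ w) != y)  # empirical 0-1 error of the learned w
```

The hinge loss upper-bounds the 0-1 loss, which is what drives the $(1 + 1/\gamma)$-type approximation factor discussed in the abstract.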


Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach

Li, Yinan, Zhang, Chicheng

arXiv.org Machine Learning

Active learning [Settles, 2009] is a practical machine learning paradigm motivated by the high cost of label annotation and the wide availability of unlabeled data. Consider the binary classification setting: given an instance space $\mathcal{X}$, a binary label space $\mathcal{Y} = \{-1, +1\}$, and a data distribution $D$ over $\mathcal{X} \times \mathcal{Y}$, we would like to learn a classifier that accurately predicts the labels of examples drawn from $D$. As the performance measure of a classifier $h$, we define its error rate to be $\mathrm{err}(h) := \mathbb{P}_{(x,y) \sim D}[h(x) \neq y]$.
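The error rate in the last line is just the probability of disagreement under $D$; its empirical (Monte Carlo) version on a labeled sample is a one-liner. A minimal sketch with a hypothetical classifier:

```python
import numpy as np

def empirical_error(h, X, y):
    """Empirical estimate of err(h) = P_{(x,y)~D}[h(x) != y]."""
    return float(np.mean(h(X) != y))

X = np.array([[1.0], [-2.0], [3.0], [-0.5]])
y = np.array([1, -1, 1, 1])
h = lambda X: np.sign(X[:, 0])  # hypothetical classifier: sign of first coordinate
print(empirical_error(h, X, y))  # 0.25: h disagrees with y only on the last point
```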


Learning Halfspaces With Membership Queries

Kelner, Ori

arXiv.org Machine Learning

Active learning is a subfield of machine learning in which the learning algorithm is allowed to choose the data from which it learns. In some cases, it has been shown that active learning can yield an exponential reduction in the number of samples the algorithm needs to see in order to reach generalization error $\leq \epsilon$. In this work we study the problem of learning halfspaces with membership queries. In the membership query scenario, we allow the learning algorithm to ask for the label of any sample in the input space. We suggest a new algorithm for this problem, and prove it achieves a near optimal label complexity in some cases. We also show that the algorithm works well in practice, and significantly outperforms uncertainty sampling.
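Uncertainty sampling, the baseline mentioned above, queries the pool point the current halfspace is least confident about, i.e. the one closest to the decision boundary. A minimal sketch (setup and names are illustrative, not from the paper):

```python
import numpy as np

def most_uncertain(w, pool):
    """Index of the pool point closest to the halfspace boundary <w, x> = 0."""
    distances = np.abs(pool @ w) / np.linalg.norm(w)
    return int(np.argmin(distances))

rng = np.random.default_rng(2)
pool = rng.normal(size=(100, 3))      # unlabeled pool
w = np.array([1.0, 0.0, 0.0])         # current halfspace estimate
i = most_uncertain(w, pool)
# pool[i] is the next point to label; membership queries go further and may
# synthesize an arbitrary query point, e.g. one lying exactly on the boundary.
```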


Learning Halfspaces with Massart Noise Under Structured Distributions

Diakonikolas, Ilias, Kontonis, Vasilis, Tzamos, Christos, Zarifis, Nikos

arXiv.org Machine Learning

We study the problem of learning halfspaces with Massart noise in the distribution-specific PAC model. We give the first computationally efficient algorithm for this problem with respect to a broad family of distributions, including log-concave distributions. This resolves an open question posed in a number of prior works. Our approach is extremely simple: We identify a smooth {\em non-convex} surrogate loss with the property that any approximate stationary point of this loss defines a halfspace that is close to the target halfspace. Given this structural result, we can use SGD to solve the underlying learning problem.
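The paper's specific surrogate loss is not reproduced here; as a generic illustration of the "run SGD on a smooth non-convex surrogate of the 0-1 loss" recipe, the sketch below uses a logistic-style surrogate $\sigma(-y\langle w, x\rangle)$ with the weights projected to the unit sphere. All parameter choices are hypothetical.

```python
import numpy as np

def sigmoid_surrogate_sgd(X, y, lr=0.5, epochs=50, seed=0):
    """SGD on a smooth surrogate of the 0-1 loss (illustrative stand-in).

    Per-example surrogate: sigma(-y * <w, x>) with sigma the logistic
    function; it is smooth and, restricted to the unit sphere, non-convex.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.ones(d) / np.sqrt(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            m = y[i] * (X[i] @ w)
            s = 1.0 / (1.0 + np.exp(m))          # sigma(-m)
            grad = -y[i] * s * (1.0 - s) * X[i]  # gradient of sigma(-m) in w
            w -= lr * grad
            w /= np.linalg.norm(w)               # project back to unit sphere
    return w
```

On clean linearly separable toy data this typically recovers a direction close to the true halfspace; the paper's actual surrogate and its stationary-point guarantee are specific to the Massart setting and the structured distributions it considers.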